THE RANDOM PAVING PROPERTY FOR 
UNIFORMLY BOUNDED MATRICES 



JOEL A. TROPP 



Abstract. This note presents a new proof of an important result due to Bourgain and Tzafriri 
that provides a partial solution to the Kadison-Singer problem. The result shows that every unit-norm
matrix whose entries are relatively small in comparison with its dimension can be paved by
a partition of constant size. That is, the coordinates can be partitioned into a constant number 
of blocks so that the restriction of the matrix to each block of coordinates has norm less than one 
half. The original proof of Bourgain and Tzafriri involves a long, delicate calculation. The new 
proof relies on the systematic use of symmetrization and (noncommutative) Khintchine inequalities 
to estimate the norms of some random matrices. 



1. Introduction 

This note presents a new proof of a result about the paving problem for matrices. Suppose that
$A$ is an $n \times n$ matrix. We say that $A$ has an $(m, \varepsilon)$-paving if there exists a partition of the set
$\{1, 2, \dots, n\}$ into $m$ blocks $\{\sigma_1, \sigma_2, \dots, \sigma_m\}$ so that

$$\Bigl\| \sum_{j=1}^{m} P_{\sigma_j} A P_{\sigma_j} \Bigr\| \le \varepsilon \|A\|,$$

where $P_{\sigma_j}$ denotes the diagonal projector onto the coordinates listed in $\sigma_j$. Since every projector
in this note is diagonal, we omit the qualification from here onward. As usual, $\|\cdot\|$ is the norm on
linear operators mapping $\ell_2^n$ to itself.

The fundamental question concerns the paving of matrices with a zero diagonal (i.e., hollow 
matrices). 

Problem 1 (Paving Problem). Fix $\varepsilon \in (0, 1)$. Is there a constant $m = m(\varepsilon)$ so that, for sufficiently
large $n$, every hollow $n \times n$ matrix has an $(m, \varepsilon)$-paving?

Anderson [And79] has shown that the Paving Problem is equivalent to the Kadison-Singer
problem, a major open question in operator theory. It is closely related to significant problems in
harmonic analysis and other areas of mathematics and engineering. See [CT06] for an intriguing
discussion.

At present, the strongest results on the paving problem are due to Bourgain and Tzafriri [BT91].
For a fixed $\varepsilon$, they established that

(1) every hollow matrix of size $n \times n$ can be paved with at most $m = O(\log n)$ blocks and

(2) every square matrix whose entries are relatively small compared with its dimension can be
paved with a constant number of blocks.

Let us present a precise statement of their second result. We use the notation $[n] = \{1, 2, \dots, n\}$.

Theorem 2 (Bourgain-Tzafriri). Fix $\gamma > 0$ and $\varepsilon \in (0,1)$. There exists a positive integer
$m = m(\gamma, \varepsilon)$ so that, for all $n \ge N(\gamma, \varepsilon)$, the following statement holds. Suppose that $A$ is an $n \times n$
unit-norm matrix with uniformly bounded entries:

$$|a_{jk}| \le \frac{1}{(\log n)^{1+\gamma}} \quad \text{for } j, k = 1, 2, \dots, n.$$

Then there is a partition of the set $[n]$ into $m$ blocks $\{\sigma_1, \sigma_2, \dots, \sigma_m\}$ such that

$$\Bigl\| \sum_{j=1}^{m} P_{\sigma_j} A P_{\sigma_j} \Bigr\| \le \varepsilon,$$

where $P_{\sigma_j}$ is the projector onto the coordinates listed in $\sigma_j$. The number $m$ satisfies the bound

$$m \le C \varepsilon^{-C/\min\{1, \gamma\}},$$

where $C$ is a positive universal constant.

The proof of this result published in [BT91] hinges on a long and delicate calculation of the 
supremum of a random process. This computation involves a difficult metric entropy bound based 
on some subtle iteration arguments. 

This note shows that the central step in the known proof can be replaced by another approach 
based on symmetrization and noncommutative Khintchine inequalities. This method for studying 
random matrices is adapted from Rudelson's article [Rud99]. Even though it is simple and elegant,
it leads to sharp bounds in many cases. By itself, Rudelson's technique is not strong enough, so we
must also invoke a method from Bourgain and Tzafriri's proof to complete the argument. As
we go along, we indicate the provenance of various parts of the argument. 

2. Problem Simplifications 

Let us begin with some problem simplifications. The reductions in this section were all proposed 
by Bourgain and Tzafriri; we provide proofs for completeness. 

The overall strategy is to construct the paving with probabilistic tools. The first proposition 
shows that we can leverage a moment estimate for the norm of a random submatrix to build 
a paving. The idea is to permute the coordinates randomly and divide them into blocks. The 
moment bound shows that, if we restrict the matrix to the coordinates in a random block, then it 
has small spectral norm. 

Proposition 3 (Random Paving Principle). Fix an integer $m$, and let $n = km$ for an integer $k$.
Let $A$ be an $n \times n$ unit-norm matrix, and suppose that $P$ is a projector onto exactly $k$ coordinates,
chosen uniformly at random from the set $[n]$. If, for $p \ge \log n$, we have the estimate

$$(\mathbb{E} \|PAP\|^p)^{1/p} \le \varepsilon,$$

then there exists a partition of the set $[n]$ into $m$ blocks $\{\sigma_1, \sigma_2, \dots, \sigma_m\}$, each of size $k$, such that

$$\Bigl\| \sum_{j=1}^{m} P_{\sigma_j} A P_{\sigma_j} \Bigr\| \le 3\varepsilon,$$

where $P_{\sigma_j}$ is the projector onto the coordinates listed in $\sigma_j$.

Proof. Consider a random permutation $\pi$ of the set $[n]$. For $j = 1, 2, \dots, m$, define

$$\sigma_j(\pi) = \{\pi(jk - k + 1), \pi(jk - k + 2), \dots, \pi(jk)\}.$$

For each $j$, the projector $P_{\sigma_j(\pi)}$ onto the coordinates in $\sigma_j(\pi)$ is a restriction to $k$ coordinates,
chosen uniformly at random. The hypothesis implies that

$$\mathbb{E} \max_{j = 1, 2, \dots, m} \|P_{\sigma_j(\pi)} A P_{\sigma_j(\pi)}\|^p \le m \varepsilon^p.$$

There must exist a permutation $\pi_0$ for which the left-hand side is no larger than its expectation. For
the partition with blocks $\sigma_j = \sigma_j(\pi_0)$, we have

$$\Bigl\| \sum_{j=1}^{m} P_{\sigma_j} A P_{\sigma_j} \Bigr\| = \max_j \|P_{\sigma_j} A P_{\sigma_j}\| \le m^{1/p} \varepsilon.$$

The equality holds because the coordinate blocks are disjoint. Finally, we have $m^{1/p} \le \mathrm{e}$ because
$m \le n$ and $p \ge \log n$. □
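The block construction in this proof can be sketched in a few lines. The sketch below is only an illustration: the function name `random_blocks` is hypothetical, and indices are 0-based, so block $j$ holds $\pi(jk), \dots, \pi(jk + k - 1)$ rather than the 1-based ranges used in the text.

```python
import random


def random_blocks(n, m, rng=None):
    """Split {0, ..., n-1} into m blocks of size k = n // m via a random permutation."""
    if n % m != 0:
        raise ValueError("n must be a multiple of m")
    rng = rng or random.Random()
    k = n // m
    perm = list(range(n))
    rng.shuffle(perm)  # a uniformly random permutation pi of the coordinates
    # Block j collects k consecutive values of pi (0-based analogue of sigma_j(pi)).
    return [perm[j * k:(j + 1) * k] for j in range(m)]


# Example usage with a fixed seed (assumed values n = 12, m = 3).
blocks = random_blocks(12, 3, random.Random(0))
```

Each block, taken on its own, is a uniformly random set of $k$ coordinates, which is the only property of the construction that the proof uses.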

This proposition shows that it is sufficient to study the restriction to a random set of coordinates 
of size k. Although this dependent coordinate model is conceptually simple, it would lead to severe 
inconveniences later in the proof. We prefer instead to study an independent coordinate model for 
the projector where the expected number of coordinates equals k. Fortunately, the two models are 
equivalent for our purposes. 

Proposition 4 (Random Coordinate Models). Fix an integer $m$, and let $n = km$ for an integer
$k$. Let $A$ be an $n \times n$ matrix. Suppose that $P$ is a projector onto $k$ coordinates, chosen uniformly
at random from $[n]$, and suppose that $R$ is a projector onto a random set of coordinates from $[n]$,
where each coordinate appears independently with probability $k/n$. For $p > 0$, it holds that

$$(\mathbb{E} \|PAP\|^p)^{1/p} \le (2\, \mathbb{E} \|RAR\|^p)^{1/p}.$$

The reduction to the independent coordinate model also appears in Bourgain and Tzafriri's paper
with a different proof. The following attractive argument is drawn from [CR06, Sec. 3].

Proof. For a coordinate projector $R$, denote by $\sigma(R)$ the set of coordinates onto which it projects.
We can make the following computation:

$$\begin{aligned}
\mathbb{P}\{\|RAR\|^p > t\} &\ge \sum_{j=k}^{n} \mathbb{P}\{\|RAR\|^p > t \mid \#\sigma(R) = j\} \cdot \mathbb{P}\{\#\sigma(R) = j\} \\
&\ge \mathbb{P}\{\|RAR\|^p > t \mid \#\sigma(R) = k\} \cdot \sum_{j=k}^{n} \mathbb{P}\{\#\sigma(R) = j\} \\
&\ge \tfrac{1}{2}\, \mathbb{P}\{\|PAP\|^p > t\}.
\end{aligned}$$

The second inequality holds because the spectral norm of a submatrix does not exceed the spectral
norm of the matrix. The third inequality relies on the fact [JS68, Thm. 3.2] that the medians of
the binomial distribution BINOMIAL$(k/n, n)$ lie between $k - 1$ and $k$. Integrate with respect to $t$ to
complete the argument. □
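The median fact used in the third inequality amounts to $\mathbb{P}\{\#\sigma(R) \ge k\} \ge 1/2$. A minimal sketch that checks this with exact rational arithmetic for small cases is given below; the helper name `upper_tail` is hypothetical, and the check is an illustration, not a substitute for the cited theorem.

```python
from fractions import Fraction
from math import comb


def upper_tail(n, k):
    """Exact P{X >= k} for X ~ Binomial(n trials, success probability k/n)."""
    q = Fraction(k, n)
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))


# Check the tail bound for every admissible k when n = 12 (an assumed test size).
ok = all(upper_tail(12, k) >= Fraction(1, 2) for k in range(1, 13))
```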

3. The Main Result 

On account of these simplifications, it suffices to prove the following theorem. In the sequel, $R_\delta$
denotes a square, diagonal matrix whose diagonal entries are independent and identically distributed
0-1 random variables with common mean $\delta$. The dimensions of $R_\delta$ conform to its context.

Theorem 5. Fix $\gamma > 0$ and $\varepsilon \in (0, 1)$. There exists a positive integer $m = m(\gamma, \varepsilon)$ so that, for
all $n \ge N(\gamma, \varepsilon)$, the following statement holds. Suppose that $A$ is an $n \times n$ unit-norm matrix with
uniformly bounded entries:

$$|a_{jk}| \le \frac{1}{(\log n)^{1+\gamma}} \quad \text{for } j, k = 1, 2, \dots, n.$$

Let $\delta = 1/m$. For $p = 2 \lceil \log n \rceil$, we have

$$(\mathbb{E} \|R_\delta A R_\delta\|^p)^{1/p} \le \varepsilon. \tag{3.1}$$

The number $m$ satisfies the bound

$$m \le (0.01\varepsilon)^{-2(1+\gamma)/\gamma}.$$



An example of Bourgain and Tzafriri shows that the number $\gamma$ cannot be removed from the
bound $(\log n)^{-(1+\gamma)}$ on the matrix entries [BT91, Ex. 2.2]. Fix $\delta \in (0, 1)$. For each $n \ge N(\delta)$, they
exhibit an $n \times n$ matrix $A$ with unit norm and bounded entries:

$$|a_{jk}| \le \frac{2 \log(1/\delta)}{\log n}.$$

For this matrix, $\mathbb{E} \|R_\delta A R_\delta\| \ge 1/2$. In particular, it has no constant-size random paving when $\varepsilon$
is small.

Proof of Theorem 2 from Theorem 5. Fix $\gamma$ and $\varepsilon$. Let $m$ be the integer guaranteed by Theorem 5,
and assume that $n$ is sufficiently large. Suppose we are given an $n \times n$ matrix with unit norm
and uniformly bounded entries. If necessary, augment the matrix with zero rows and columns so
that its dimension is a multiple of $m$.

Apply Proposition 4 to transfer the estimate (3.1) to the dependent coordinate model. The
Random Paving Principle shows that the augmented matrix has an $(m, 6\varepsilon)$-paving. Discard the
zero rows and columns to complete the proof of Theorem 2. □

4. Proof of Theorem 5

In this section, we establish Theorem 5. The proofs of the supporting results are postponed to
the subsequent sections.

Fix $\gamma > 0$ and $\varepsilon \in (0, 1)$. We assume for convenience that $n \ge 8$, and we suppose that $A$ is an
$n \times n$ matrix with unit norm and uniformly bounded entries:

$$|a_{jk}| \le \frac{1}{(\log n)^{1+\gamma}} \overset{\mathrm{def}}{=} \mu.$$

In the sequel, the symbol $\mu$ always abbreviates the uniform bound. Finally, set $p = 2 \lceil \log n \rceil$.
The major task in the proof is to obtain an estimate for the quantity

$$E(\varrho) \overset{\mathrm{def}}{=} (\mathbb{E} \|R_\varrho A R_\varrho\|^p)^{1/p},$$

where $\varrho$ is not too small. This estimate is accomplished with decoupling, symmetrization, and
noncommutative Khintchine inequalities. This approach is adapted from work of Rudelson [Rud99]
and Rudelson-Vershynin [RV07]. Given this estimate for $E(\varrho)$, we extrapolate the value of $E(m^{-1})$
for a large constant $m = m(\gamma, \varepsilon)$. This step relies on an elegant method due to Bourgain and
Tzafriri.

Before continuing, we instate a few more pieces of notation. The symbol $\|\cdot\|_{1,2}$ denotes the
norm of an operator mapping $\ell_1^n$ to $\ell_2^n$. For a matrix $X$ expressed in the standard basis, $\|X\|_{1,2}$
is the maximum $\ell_2$ norm achieved by a column of $X$. The norm $\|X\|_{\max}$ calculates the maximum
absolute value of an entry of $X$.

4.1. Step 1: Decoupling. As in Bourgain and Tzafriri's work, the first step involves a classical
decoupling argument. First, we must remove the diagonal of the matrix. Since the entries of $A$ do
not exceed $\mu$, it follows that $\|\operatorname{diag} A\| \le \mu$. Define

$$B = \frac{1}{1 + \mu} (A - \operatorname{diag} A).$$

Note that $B$ has a zero diagonal and that $\|B\| \le 1$. Furthermore,

$$|b_{jk}| < \mu \quad \text{for } j, k = 1, 2, \dots, n.$$

With this definition,

$$E(\varrho) \le \|\operatorname{diag} A\| + (1 + \mu) (\mathbb{E} \|R_\varrho B R_\varrho\|^p)^{1/p}.$$

The expectation on the right-hand side cannot exceed one, so we have

$$E(\varrho) \le 2\mu + (\mathbb{E} \|R_\varrho B R_\varrho\|^p)^{1/p}.$$

Now, we may replace the projector $R_\varrho$ by a pair of independent projectors by invoking the following
result.

Proposition 6. Let $B$ be a square matrix with a zero diagonal, and let $p \ge 1$. Then

$$(\mathbb{E} \|R_\varrho B R_\varrho\|^p)^{1/p} \le 20\, (\mathbb{E} \|R_\varrho B R'_\varrho\|^p)^{1/p},$$

where the two random projectors on the right-hand side are independent.

See [BT87, Prop. 1.1] or [LT91, Sec. 4.4] for the simple proof.
We apply Proposition 6 to reach

$$E(\varrho) \le 2\mu + 20\, (\mathbb{E} \|R_\varrho B R'_\varrho\|^p)^{1/p}. \tag{4.1}$$

4.2. Step 2: Norm of a Random Restriction. The next step of the proof is to develop a bound
on the spectral norm of a matrix that has been restricted to a random subset of its columns. The
following result is due to Rudelson and Vershynin [RV07], with some inessential modifications by
the author.

Proposition 7 (Rudelson-Vershynin). Let $X$ be an $n \times n$ matrix, and suppose that $p \ge 2 \log n \ge 2$.
Then

$$(\mathbb{E} \|X R_\varrho\|^p)^{1/p} \le 3\sqrt{p}\, (\mathbb{E} \|X R_\varrho\|_{1,2}^p)^{1/p} + \sqrt{\varrho}\, \|X\|.$$

The proof of Proposition 7 depends on a lemma of Rudelson that bounds the norm of a
Rademacher sum of rank-one, self-adjoint matrices [Rud99]. This lemma, in turn, hinges on the
noncommutative Khintchine inequality [LP86, Buc01]. See Section 5 for the details.

To account for the influence of $R'_\varrho$, we apply Proposition 7 with $X = R_\varrho B$. Inequality (4.1)
becomes

$$E(\varrho) \le 2\mu + 60\sqrt{p}\, (\mathbb{E} \|R_\varrho B R'_\varrho\|_{1,2}^p)^{1/p} + 20\sqrt{\varrho}\, (\mathbb{E} \|R_\varrho B\|^p)^{1/p}.$$

We invoke Proposition 7 again with $X = B^*$ to reach

$$E(\varrho) \le 2\mu + 60\sqrt{p}\, (\mathbb{E} \|R_\varrho B R'_\varrho\|_{1,2}^p)^{1/p} + 60\sqrt{\varrho p}\, (\mathbb{E} \|B^* R_\varrho\|_{1,2}^p)^{1/p} + 20 \varrho\, \|B^*\|.$$

Discard the projector $R'_\varrho$ from the first expectation by means of the observation

$$\|R_\varrho B R'_\varrho\|_{1,2} \le \|R_\varrho B\|_{1,2}.$$

In words, the maximum column norm of a matrix is at least the maximum column norm of any
submatrix. We also have the bound

$$\|B^* R_\varrho\|_{1,2} \le \|B^*\|_{1,2} \le \|B^*\| \le 1$$

because the spectral norm dominates the maximum $\ell_2$ norm of a column. The inequality $\varrho \le \sqrt{\varrho}$
yields

$$E(\varrho) \le 2\mu + 60\sqrt{p}\, (\mathbb{E} \|R_\varrho B\|_{1,2}^p)^{1/p} + 80\sqrt{\varrho p}. \tag{4.2}$$

4.3. Step 3: Estimate of Maximum Column Norm. To complete our estimate of $E(\varrho)$, we
must bound the remaining expectation. The following result does the job.

Proposition 8. Let $X$ be an $n \times n$ matrix, and suppose that $p \ge 2 \log n \ge 4$. Then

$$(\mathbb{E} \|R_\varrho X\|_{1,2}^p)^{1/p} \le 3\sqrt{p}\, \|X\|_{\max} + \sqrt{\varrho}\, \|X\|_{1,2}.$$

The proof of Proposition 8 uses only classical methods, including symmetrization and scalar
Khintchine inequalities. A related bound appears inside the proof of Proposition 2.5 in [BT91].
Turn to Section 6 for the argument.

Apply Proposition 8 to the remaining expectation in (4.2) to find that

$$E(\varrho) \le 2\mu + 180 p\, \|B\|_{\max} + 60\sqrt{\varrho p}\, \|B\|_{1,2} + 80\sqrt{\varrho p}.$$

As above, the maximum column norm $\|B\|_{1,2} \le 1$. The entries of $B$ are uniformly bounded by $\mu$.
Recall $p = 2 \lceil \log n \rceil$ to conclude that

$$E(\varrho) \le 550 \mu \log n + 250\sqrt{\varrho \log n}, \tag{4.3}$$

taking into account $\lceil \log n \rceil \le 1.5 \log n$ whenever $n \ge 8$.
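The constants in (4.3) come from substituting $\|B\|_{\max} \le \mu$, $\|B\|_{1,2} \le 1$, and $p = 2\lceil \log n \rceil$ into the previous display. The sketch below checks the resulting arithmetic numerically; the function names `bound_before` and `bound_43` and the sample values of $\mu$ and $\varrho$ are assumptions made for illustration only.

```python
from math import ceil, log, sqrt


def bound_before(n, mu, rho):
    """2*mu + 180*p*mu + (60 + 80)*sqrt(rho*p) with p = 2*ceil(log n)."""
    p = 2 * ceil(log(n))
    return 2 * mu + 180 * p * mu + 140 * sqrt(rho * p)


def bound_43(n, mu, rho):
    """Right-hand side of (4.3): 550*mu*log n + 250*sqrt(rho*log n)."""
    return 550 * mu * log(n) + 250 * sqrt(rho * log(n))


# The simplified bound should dominate the unsimplified one whenever n >= 8.
dominates = all(
    bound_before(n, mu, rho) <= bound_43(n, mu, rho)
    for n in range(8, 2000)
    for mu in (1e-3, 0.1, 1.0)
    for rho in (1e-4, 0.01, 0.5)
)
```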

The result in (4.3) is not quite strong enough to establish Theorem 5. In the theorem, the relation
between the size $m$ of the paving and the proportion $\delta$ of columns is $\delta = 1/m$. The parameter
$\varrho$ also represents the proportion of columns selected. Unfortunately, when we set $\varrho = 1/m$, we
find that the bound in (4.3) is trivial unless $\varrho$ is smaller than $c/\log n$, which suggests that $m$
grows logarithmically with $n$. To prove the result, however, we must obtain a bound for $m$ that is
independent of dimension.

4.4. Step 4: Extrapolation. To finish the argument, we require a remarkable fact uncovered by
Bourgain and Tzafriri in their work. Roughly speaking, the value of $E(\varrho)^p$ is comparable with a
polynomial of low degree. It is possible to use the inequality (4.3) to estimate the coefficients of
this polynomial. We can then extrapolate to obtain a nontrivial estimate of $E(\delta)^p$, where $\delta$ is a
small constant.

Proposition 9 (Bourgain-Tzafriri). Let $X$ be an $n \times n$ matrix with $\|X\| \le 1$. Suppose that $p$ is
an even integer with $p \ge 2 \log n$. Choose parameters $\delta \in (0, 1)$ and $\varrho \in (0, 0.5)$. For each $\lambda \in (0, 1)$,
it holds that

$$(\mathbb{E} \|R_\delta X R_\delta\|^p)^{1/p} \le 60 \left[ \delta^{\lambda} + \varrho^{-\lambda} (\mathbb{E} \|R_\varrho X R_\varrho\|^p)^{1/p} \right].$$

The proof depends essentially on a result of V. A. Markov that bounds the coefficients of a
polynomial in terms of its maximum value. See Section 7 for the details.
Recall now that

$$\mu \le \frac{1}{(\log n)^{1+\gamma}}.$$

Set the proportion $\varrho = (\log n)^{-1-2\gamma}$, and introduce these quantities into (4.3) to obtain

$$(\mathbb{E} \|R_\varrho A R_\varrho\|^p)^{1/p} \le 800 (\log n)^{-\gamma}.$$

Proposition 9 shows that

$$(\mathbb{E} \|R_\delta A R_\delta\|^p)^{1/p} \le 60 \delta^{\lambda} + 48000 (\log n)^{\lambda(1+2\gamma) - \gamma}$$

for every value of $\lambda$ in $(0,1)$. Make the selection $\lambda = \gamma/(2 + 2\gamma)$. Since the exponent on the
logarithm is strictly negative, it follows for sufficiently large $n$ that

$$(\mathbb{E} \|R_\delta A R_\delta\|^p)^{1/p} \le 100\, \delta^{\gamma/(2+2\gamma)}.$$

To make the right-hand side less than a parameter $\varepsilon$, it suffices that $\delta \le (0.01\varepsilon)^{2(1+\gamma)/\gamma}$. Therefore,
any value

$$m \ge (0.01\varepsilon)^{-2(1+\gamma)/\gamma}$$

is enough to establish Theorem 5.
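The parameter choices in this step are plain arithmetic, which the following sketch reproduces. The function name `paving_parameters` and the sample inputs are hypothetical; the formulas are the ones stated above, with $m$ rounded up to an integer.

```python
from math import ceil


def paving_parameters(gamma, eps):
    """Return (lambda, exponent on log n, smallest admissible integer m)."""
    lam = gamma / (2 + 2 * gamma)                 # selection lambda = gamma / (2 + 2*gamma)
    log_exponent = lam * (1 + 2 * gamma) - gamma  # equals -gamma / (2 + 2*gamma), hence negative
    m = ceil((0.01 * eps) ** (-2 * (1 + gamma) / gamma))
    return lam, log_exponent, m


# Example with assumed inputs gamma = 1 and eps = 0.5.
lam, log_exponent, m = paving_parameters(1.0, 0.5)
```

With $\delta = 1/m$, the quantity $100\,\delta^{\lambda}$ is then at most $\varepsilon$ up to floating-point rounding. The size of $m$ for even moderate inputs shows that the bound is of theoretical rather than practical interest.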


5. Proof of Random Restriction Estimate



In this section, we establish Proposition 7. The difficult part of the estimation is performed with
the noncommutative Khintchine inequality. This result was originally discovered by Lust-Piquard
[LP86]. We require a sharp version due to Buchholz [Buc01] that provides the optimal order of
growth in the constant.

Before continuing, we state a few definitions. Given a matrix $X$, let $\sigma(X)$ denote the vector of
its singular values, listed in weakly decreasing order. The Schatten $p$-norm $\|\cdot\|_{S_p}$ is calculated as

$$\|X\|_{S_p} = \|\sigma(X)\|_p,$$

where $\|\cdot\|_p$ denotes the $\ell_p$ vector norm.

A Rademacher random variable takes the two values ±1 with equal probability. A Rademacher 
sequence is a sequence of independent Rademacher variables. 



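Proposition 10 (Noncommutative Khintchine Inequality). Let $\{X_j\}$ be a finite sequence of matrices
of the same dimension, and let $\{\varepsilon_j\}$ be a Rademacher sequence. For each $p \ge 2$,

$$\Bigl[ \mathbb{E} \Bigl\| \sum_j \varepsilon_j X_j \Bigr\|_{S_p}^p \Bigr]^{1/p} \le C_p \max\Bigl\{ \Bigl\| \Bigl( \sum_j X_j X_j^* \Bigr)^{1/2} \Bigr\|_{S_p},\ \Bigl\| \Bigl( \sum_j X_j^* X_j \Bigr)^{1/2} \Bigr\|_{S_p} \Bigr\}, \tag{5.1}$$

where $C_p \le 2^{-0.25} \sqrt{\pi/\mathrm{e}}\, \sqrt{p}$.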

This proposition is a corollary of Theorem 5 of [Buc01]. In this work, Buchholz shows that the
noncommutative Khintchine inequality holds with a Gaussian sequence in place of the Rademacher
sequence. He computes the optimal constant when $p$ is an even integer:

$$C_{2n} = \left( \frac{(2n)!}{2^n\, n!} \right)^{1/2n}.$$



One extends this result to other values of p using Stirling's approximation and an interpolation 
argument. The inequality for Rademacher variables follows from the contraction principle. 
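As a sanity check on the stated constant, one can compare Buchholz's value $C_{2n}$ with the bound $2^{-0.25}\sqrt{\pi/\mathrm{e}}\,\sqrt{p}$ at $p = 2n$. The sketch below does so numerically for a range of $n$; the function names are hypothetical, and the log-gamma function is used only to avoid overflow in the factorials.

```python
from math import e, exp, lgamma, log, pi, sqrt


def buchholz_constant(n):
    """C_{2n} = ((2n)! / (2^n n!))^(1/(2n)), computed via log-gamma."""
    log_ratio = lgamma(2 * n + 1) - n * log(2) - lgamma(n + 1)
    return exp(log_ratio / (2 * n))


def stated_bound(p):
    """The bound 2^(-0.25) * sqrt(pi/e) * sqrt(p) quoted for C_p."""
    return 2 ** -0.25 * sqrt(pi / e) * sqrt(p)


# Compare the two quantities at even exponents p = 2n for n = 1, ..., 499.
holds = all(buchholz_constant(n) <= stated_bound(2 * n) for n in range(1, 500))
```

The same numbers explain the constant in Rudelson's lemma: multiplying the bound by $\sqrt{\mathrm{e}}$ gives $2^{-0.25}\sqrt{\pi}\,\sqrt{p}$, and $2^{-0.25}\sqrt{\pi}$ is slightly below $1.5$.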

In an important paper [Rud99], Rudelson showed how to use the noncommutative Khintchine
inequality to study the moments of a Rademacher sum of rank-one matrices.

Lemma 11 (Rudelson). Suppose that $x_1, x_2, \dots, x_n$ are the columns of a matrix $X$. For any
$p \ge 2 \log n$, it holds that

$$\Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} \varepsilon_j x_j x_j^* \Bigr\|^p \Bigr)^{1/p} \le 1.5 \sqrt{p}\, \|X\|_{1,2}\, \|X\|,$$

where $\{\varepsilon_j\}$ is a Rademacher sequence.

Proof. First, bound the spectral norm by the Schatten $p$-norm.

$$E \overset{\mathrm{def}}{=} \Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} \varepsilon_j x_j x_j^* \Bigr\|^p \Bigr)^{1/p} \le \Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} \varepsilon_j x_j x_j^* \Bigr\|_{S_p}^p \Bigr)^{1/p}.$$

Apply the noncommutative Khintchine inequality to obtain

$$E \le C_p \Bigl\| \Bigl( \sum_{j=1}^{n} \|x_j\|_2^2\, x_j x_j^* \Bigr)^{1/2} \Bigr\|_{S_p}.$$

The rank of the matrix inside the norm does not exceed $n$, so we can bound the Schatten $p$-norm by
the spectral norm if we pay a factor of $n^{1/p}$, which does not exceed $\sqrt{\mathrm{e}}$. Afterward, pull the square
root out of the norm to find

$$E \le C_p \sqrt{\mathrm{e}}\, \Bigl\| \sum_{j=1}^{n} \|x_j\|_2^2\, x_j x_j^* \Bigr\|^{1/2}.$$

The summands are positive semidefinite, so the spectral norm of the sum increases monotonically
with each scalar coefficient. Therefore, we may replace each coefficient by $\max_j \|x_j\|_2^2$ and use the
homogeneity of the norm to obtain

$$E \le C_p \sqrt{\mathrm{e}}\, \max_j \|x_j\|_2\, \Bigl\| \sum_{j=1}^{n} x_j x_j^* \Bigr\|^{1/2}.$$

The maximum can be rewritten as $\|X\|_{1,2}$, and the spectral norm can be expressed as

$$\Bigl\| \sum_{j=1}^{n} x_j x_j^* \Bigr\|^{1/2} = \|X X^*\|^{1/2} = \|X\|.$$

Recall that $C_p \le 2^{-0.25} \sqrt{\pi/\mathrm{e}}\, \sqrt{p}$ to complete the proof. □

Recently, Rudelson and Vershynin showed how Lemma 11 implies a bound on the moments of
the norm of a matrix that is compressed to a random subset of columns [RV07].

Proposition 12 (Rudelson-Vershynin). Let $X$ be a matrix with $n$ columns, and suppose that
$p \ge 2 \log n \ge 2$. It holds that

$$(\mathbb{E} \|X R_\varrho\|^p)^{1/p} \le 3\sqrt{p}\, (\mathbb{E} \|X R_\varrho\|_{1,2}^p)^{1/p} + \sqrt{\varrho}\, \|X\|.$$

In words, a random compression of a matrix gets its share of the spectral norm plus another
component that depends on the total number of columns and on the $\ell_2$ norms of the columns.

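Proof. Let us begin with an overview of the proof. First, we express the random compression as a
random sum. Then we symmetrize the sum and apply Rudelson's lemma to obtain an upper bound
involving the value we are trying to estimate. Finally, we solve an algebraic relation to obtain an
explicit estimate for the moment.
We seek a bound for

$$E \overset{\mathrm{def}}{=} (\mathbb{E} \|X R_\varrho\|^p)^{1/p}.$$

First, observe that

$$E^2 = (\mathbb{E} \|X R_\varrho X^*\|^{p/2})^{2/p} = \Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} \varrho_j x_j x_j^* \Bigr\|^{p/2} \Bigr)^{2/p},$$

where $\{\varrho_j\}$ is a sequence of independent 0-1 random variables with common mean $\varrho$. Subtract the
mean, and apply the triangle inequality (once for the spectral norm and once for the $L_{p/2}$ norm):

$$E^2 \le \Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} (\varrho_j - \varrho)\, x_j x_j^* \Bigr\|^{p/2} \Bigr)^{2/p} + \varrho\, \Bigl\| \sum_{j=1}^{n} x_j x_j^* \Bigr\|.$$

In the sum, write $\varrho = \mathbb{E} \varrho'_j$ where $\{\varrho'_j\}$ is an independent copy of the sequence $\{\varrho_j\}$. Draw the
expectation out of the norm with Jensen's inequality:

$$E^2 \le \Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} (\varrho_j - \varrho'_j)\, x_j x_j^* \Bigr\|^{p/2} \Bigr)^{2/p} + \varrho\, \|X X^*\|.$$

The random variables $(\varrho_j - \varrho'_j)$ are symmetric and independent, so we may symmetrize them using
the standard method, Lemma 6.1 of [LT91]:

$$E^2 \le \Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} \varepsilon_j (\varrho_j - \varrho'_j)\, x_j x_j^* \Bigr\|^{p/2} \Bigr)^{2/p} + \varrho\, \|X\|^2,$$

where $\{\varepsilon_j\}$ is a Rademacher sequence. Apply the triangle inequality again and use the identical
distribution of the sequences to obtain

$$E^2 \le 2 \Bigl( \mathbb{E} \Bigl\| \sum_{j=1}^{n} \varepsilon_j \varrho_j\, x_j x_j^* \Bigr\|^{p/2} \Bigr)^{2/p} + \varrho\, \|X\|^2.$$

Writing $\Omega = \{j : \varrho_j = 1\}$, we see that

$$E^2 \le 2 \Bigl[ \mathbb{E}_\Omega \Bigl( \mathbb{E}_\varepsilon \Bigl\| \sum_{j \in \Omega} \varepsilon_j x_j x_j^* \Bigr\|^{p/2} \Bigr) \Bigr]^{2/p} + \varrho\, \|X\|^2.$$

Here, $\mathbb{E}_\varepsilon$ is the partial expectation with respect to $\{\varepsilon_j\}$, holding the other random variables fixed.

To estimate the large parenthesis, invoke Rudelson's Lemma, conditional on $\Omega$. The matrix in
the statement of the lemma is $X R_\varrho$, resulting in

$$E^2 \le 3\sqrt{p}\, \Bigl[ \mathbb{E} \bigl( \|X R_\varrho\|_{1,2}^{p/2}\, \|X R_\varrho\|^{p/2} \bigr) \Bigr]^{2/p} + \varrho\, \|X\|^2.$$

Apply the Cauchy-Schwarz inequality to find that

$$E^2 \le 3\sqrt{p}\, (\mathbb{E} \|X R_\varrho\|_{1,2}^p)^{1/p}\, (\mathbb{E} \|X R_\varrho\|^p)^{1/p} + \varrho\, \|X\|^2.$$

This inequality takes the form $E^2 \le bE + c$. Select the larger root of the quadratic and use the
subadditivity of the square root:

$$E \le \frac{b + \sqrt{b^2 + 4c}}{2} \le b + \sqrt{c}.$$

This yields the conclusion. □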
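For readers who want the last step of this proof spelled out, the following short derivation shows how the quadratic relation gives the explicit bound. It uses only the inequality $\sqrt{u + v} \le \sqrt{u} + \sqrt{v}$ for nonnegative $u, v$; the identification of $b$ and $c$ is read off from the Cauchy-Schwarz display in the proof.

```latex
\[
E^2 \le bE + c
\quad\Longrightarrow\quad
E \le \frac{b + \sqrt{b^2 + 4c}}{2}
  \le \frac{b + \sqrt{b^2} + \sqrt{4c}}{2}
  = b + \sqrt{c},
\]
\[
\text{with}\quad
b = 3\sqrt{p}\,\bigl(\mathbb{E}\,\|X R_\varrho\|_{1,2}^{p}\bigr)^{1/p},
\qquad
c = \varrho\,\|X\|^{2}.
\]
```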



6. Proof of Maximum Column Norm Estimate 

This section establishes the moment bound for the maximum column norm of a matrix that
has been restricted to a random set of its rows. We use an approach that is analogous to the
argument in Proposition 12. In this case, we require only the scalar Khintchine inequality to
perform the estimation. Bourgain and Tzafriri's proof of Proposition 2.5 [BT91] contains a similar
bound, developed with a similar argument.

Proposition 13. Assume that $X$ has $n$ columns, and suppose $p \ge 2 \log n \ge 4$. Then

$$(\mathbb{E} \|R_\varrho X\|_{1,2}^p)^{1/p} \le 2^{1.5} \sqrt{p}\, \|X\|_{\max} + \sqrt{\varrho}\, \|X\|_{1,2}.$$

In words, the $B(\ell_1^n, \ell_2^n)$ norm of a matrix that has been compressed to a random set of rows gets
its share of the total, plus an additional component that depends on the number of columns and
the magnitude of the largest entry in the matrix.

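Proof. Our strategy is the same as in the proof of Proposition 12, so we pass lightly over the
details. Let $\{\varrho_j\}$ be a sequence of independent 0-1 random variables with common mean $\varrho$. We
seek a bound for

$$E^2 \overset{\mathrm{def}}{=} (\mathbb{E} \|R_\varrho X\|_{1,2}^p)^{2/p} = \Bigl( \mathbb{E} \max_{k = 1, 2, \dots, n} \Bigl| \sum_j \varrho_j |x_{jk}|^2 \Bigr|^{p/2} \Bigr)^{2/p}.$$

In the sequel, we abbreviate $q = p/2$ and also $y_{jk} = |x_{jk}|^2$.

First, center and symmetrize the selectors. In the following calculation, $\{\varrho'_j\}$ is an independent
copy of the sequence $\{\varrho_j\}$, and $\{\varepsilon_j\}$ is a Rademacher sequence, independent of everything else.

$$\begin{aligned}
E^2 &\le \Bigl( \mathbb{E} \max_k \Bigl| \sum_j (\varrho_j - \varrho)\, y_{jk} \Bigr|^q \Bigr)^{1/q} + \varrho \max_k \sum_j y_{jk} \\
&\le \Bigl( \mathbb{E} \max_k \Bigl| \sum_j (\varrho_j - \varrho'_j)\, y_{jk} \Bigr|^q \Bigr)^{1/q} + \varrho\, \|X\|_{1,2}^2 \\
&\le \Bigl( \mathbb{E} \max_k \Bigl| \sum_j \varepsilon_j (\varrho_j - \varrho'_j)\, y_{jk} \Bigr|^q \Bigr)^{1/q} + \varrho\, \|X\|_{1,2}^2 \\
&\le 2 \Bigl( \mathbb{E} \max_k \Bigl| \sum_j \varepsilon_j \varrho_j\, y_{jk} \Bigr|^q \Bigr)^{1/q} + \varrho\, \|X\|_{1,2}^2.
\end{aligned}$$

The first step uses the triangle inequality; the second uses $\varrho = \mathbb{E} \varrho'_j$ and Jensen's inequality; the
third follows from the standard symmetrization, Lemma 6.1 of [LT91]. In the last step, we invoked
the triangle inequality and the identical distribution of the two sequences.
Next, bound the maximum by a sum and introduce conditional expectations:

$$E^2 \le 2 \Bigl( \mathbb{E}_\varrho \sum_k \mathbb{E}_\varepsilon \Bigl| \sum_j \varepsilon_j \varrho_j\, y_{jk} \Bigr|^q \Bigr)^{1/q} + \varrho\, \|X\|_{1,2}^2.$$

Here, $\mathbb{E}_\varepsilon$ denotes partial expectation with respect to $\{\varepsilon_j\}$, holding the other random variables fixed.
Since $q \ge 2$, we may apply the scalar Khintchine inequality to the inner expectation to obtain

$$E^2 \le 2 C_q \Bigl( \mathbb{E}_\varrho \sum_k \Bigl| \sum_j \varrho_j\, y_{jk}^2 \Bigr|^{q/2} \Bigr)^{1/q} + \varrho\, \|X\|_{1,2}^2,$$

where the constant $C_q \le 2^{0.25}\, \mathrm{e}^{-1/2} \sqrt{q}$. The value of the constant follows from work of Haagerup
[Haa82], combined with Stirling's approximation.

Bound the outer sum, which ranges over $n$ indices, by a maximum:

$$E^2 \le 2^{1.25}\, \mathrm{e}^{-1/2}\, n^{1/q} \sqrt{q}\, \Bigl( \mathbb{E}_\varrho \max_k \Bigl| \sum_j \varrho_j\, y_{jk}^2 \Bigr|^{q/2} \Bigr)^{1/q} + \varrho\, \|X\|_{1,2}^2.$$

Since $q \ge \log n$, it holds that $n^{1/q} \le \mathrm{e}$, which implies that the leading constant is less than four.
Use Hölder's inequality to bound the sum, and then apply Hölder's inequality again to double the
exponent:

$$\begin{aligned}
E^2 &\le 4\sqrt{q}\, \Bigl( \max_{j,k} y_{jk} \Bigr)^{1/2} \Bigl( \mathbb{E}_\varrho \max_k \Bigl| \sum_j \varrho_j\, y_{jk} \Bigr|^{q/2} \Bigr)^{1/q} + \varrho\, \|X\|_{1,2}^2 \\
&\le 4\sqrt{q}\, \Bigl( \max_{j,k} y_{jk} \Bigr)^{1/2} \Bigl( \mathbb{E}_\varrho \max_k \Bigl| \sum_j \varrho_j\, y_{jk} \Bigr|^{q} \Bigr)^{1/2q} + \varrho\, \|X\|_{1,2}^2.
\end{aligned}$$

Recall that $q = p/2$ and that $y_{jk} = |x_{jk}|^2$. Observe that we have obtained a copy of $E$ on the
right-hand side, so

$$E^2 \le 2^{1.5} \sqrt{p}\, \|X\|_{\max}\, E + \varrho\, \|X\|_{1,2}^2.$$

As in the proof of Proposition 12, we take the larger root of the quadratic and invoke the
subadditivity of the square root to reach

$$E \le 2^{1.5} \sqrt{p}\, \|X\|_{\max} + \sqrt{\varrho}\, \|X\|_{1,2}.$$

This is the advertised conclusion. □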
7. Proof of Extrapolation Bound

This section summarizes the argument of Bourgain and Tzafriri that leads to the extrapolation
result. The key to the proof is an observation due to V. A. Markov that estimates the coefficients
of an arbitrary polynomial in terms of its maximum value [Tim63, Sec. 2.9].

Proposition 14 (Markov). Let $r(t) = \sum_{k=0}^{d} c_k t^k$. The coefficients of the polynomial $r$ satisfy the
inequality

$$|c_k| \le \frac{d^k}{k!} \max_{|t| \le 1} |r(t)| \le \mathrm{e}^d \max_{|t| \le 1} |r(t)|$$

for each $k = 0, 1, \dots, d$.

The proof depends on the minimax property of the Chebyshev polynomial of degree d, combined 
with a careful determination of its coefficients. 
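As an illustration of the coefficient bound, the sketch below computes the coefficients of the Chebyshev polynomial $T_d$, whose maximum modulus on $[-1, 1]$ equals one, and compares them with $d^k/k!$. The function names are hypothetical, and checking one family of polynomials is of course not a proof of the proposition.

```python
from fractions import Fraction
from math import factorial


def chebyshev_coefficients(d):
    """Coefficients [c_0, ..., c_d] of T_d, via T_{j+1}(t) = 2 t T_j(t) - T_{j-1}(t)."""
    prev, cur = [1], [0, 1]  # T_0 = 1 and T_1 = t
    if d == 0:
        return prev
    for _ in range(d - 1):
        nxt = [0] + [2 * c for c in cur]  # coefficients of 2 t T_j
        for i, c in enumerate(prev):
            nxt[i] -= c                   # subtract T_{j-1}
        prev, cur = cur, nxt
    return cur


def markov_bound(d, k):
    """The quantity d^k / k! from Proposition 14, as an exact fraction."""
    return Fraction(d**k, factorial(k))


# Compare |c_k| with d^k / k! for Chebyshev polynomials of degree 1 to 14.
within_bound = all(
    abs(c) <= markov_bound(d, k)
    for d in range(1, 15)
    for k, c in enumerate(chebyshev_coefficients(d))
)
```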

Proposition 15 (Bourgain-Tzafriri). Let $p$ be an even integer with $p \ge 2 \log n$. Suppose that $X$ is
an $n \times n$ matrix with $\|X\| \le 1$. Choose parameters $\delta \in (0, 1)$ and $\varrho \in (0, 0.5)$. For each $\lambda \in (0, 1)$,
it holds that

$$(\mathbb{E} \|R_\delta X R_\delta\|^p)^{1/p} \le 60 \left[ \delta^{\lambda} + \varrho^{-\lambda} (\mathbb{E} \|R_\varrho X R_\varrho\|^p)^{1/p} \right].$$

For self-adjoint matrices, the constant is halved.

Proof. We assume that $X$ is self-adjoint. For general $X$, apply the final bound to each half of the
Cartesian decomposition

$$X = \frac{X + X^*}{2} + \mathrm{i} \cdot \frac{X - X^*}{2\mathrm{i}}.$$

This yields the same result with constants doubled.
Consider the function

$$F(s) = \mathbb{E} \|R_s X R_s\|^p \quad \text{with } 0 \le s \le 1.$$

Note that $F(s) \le 1$ because $\|R_s X R_s\| \le \|X\| \le 1$. Furthermore, $F$ increases monotonically.

Next, we show that $F$ is comparable with a polynomial. Use the facts that $p$ is even, that
$p \ge \log n$, and that $\operatorname{rank} X \le n$ to check the inequalities

$$F(s) \le \mathbb{E} \operatorname{trace} (R_s X R_s)^p \le \mathrm{e}^p F(s).$$

It is easy to see that the central member is a polynomial of maximum degree $p$ in the variable $s$.
Indeed, one may expand the product and compute the expectation using the fact that the diagonal
entries of $R_s$ are independent 0-1 random variables of mean $s$. Therefore,

$$\mathbb{E} \operatorname{trace} (R_s X R_s)^p = \sum_{k=1}^{p} c_k s^k$$

for (unknown) coefficients $c_1, c_2, \dots, c_p$. The polynomial has no constant term because $R_0 = 0$.

We must develop some information about this polynomial. Make the change of variables $s = \varrho t^2$
to see that

$$\Bigl| \sum_{k=1}^{p} c_k \varrho^k t^{2k} \Bigr| \le \mathrm{e}^p F(\varrho t^2) \le \mathrm{e}^p F(\varrho) \quad \text{when } |t| \le 1.$$

The second inequality follows from the monotonicity of $F$. The polynomial on the left-hand side
has degree $2p$ in the variable $t$, so Proposition 14 results in

$$|c_k|\, \varrho^k \le \mathrm{e}^{3p} F(\varrho) \quad \text{for } k = 1, 2, \dots, p.$$

From here, it also follows that $|c_k| \le \mathrm{e}^{3p}$ by taking $\varrho = 1$.

Finally, we directly evaluate the polynomial at $\delta$ using the facts we have uncovered. For an
arbitrary value of $\lambda$ in $(0, 1)$, we have

$$\begin{aligned}
F(\delta) &\le \Bigl| \sum_{k=1}^{p} c_k \delta^k \Bigr| \le \sum_{k=1}^{\lfloor \lambda p \rfloor} |c_k| + \sum_{k = \lfloor \lambda p \rfloor + 1}^{p} |c_k|\, \delta^k \\
&\le \mathrm{e}^{3p} F(\varrho) \sum_{k=1}^{\lfloor \lambda p \rfloor} \varrho^{-k} + p\, \mathrm{e}^{3p} \delta^{\lambda p} \\
&\le 2 \mathrm{e}^{3p} \varrho^{-\lambda p} F(\varrho) + p\, \mathrm{e}^{3p} \delta^{\lambda p}
\end{aligned}$$

since $\varrho \le 0.5$. Since $x \mapsto x^{1/p}$ is subadditive, we conclude that

$$F(\delta)^{1/p} \le 2^{1/p} \mathrm{e}^3 \cdot \varrho^{-\lambda} F(\varrho)^{1/p} + p^{1/p} \mathrm{e}^3 \cdot \delta^{\lambda}.$$

A numerical calculation shows that both the leading terms are less than 30, irrespective of $p$. □
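The numerical calculation mentioned at the end of the proof is easy to reproduce. The sketch below evaluates the two leading constants $2^{1/p}\mathrm{e}^3$ and $p^{1/p}\mathrm{e}^3$ over a range of even integers $p$; the function name and the range of $p$ are arbitrary choices made for illustration.

```python
from math import exp


def leading_constants(p):
    """Return the pair (2^(1/p) * e^3, p^(1/p) * e^3)."""
    return 2 ** (1 / p) * exp(3), p ** (1 / p) * exp(3)


# Largest value of either constant over even p from 2 to 2000.
worst = max(max(leading_constants(p)) for p in range(2, 2001, 2))
```

Both expressions decrease toward $\mathrm{e}^3 \approx 20.1$ as $p$ grows, so the small values of $p$ are the only ones that matter for the bound of 30.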